MINIMAX BOUNDS FOR ESTIMATION OF NORMAL MIXTURES 



KYOUNG HEE KIM 

Abstract. This paper deals with minimax rates of convergence for estimation of
density functions on the real line. The densities are assumed to be location
mixtures of normals, a global regularity requirement that creates subtle
difficulties for the application of standard minimax lower bound methods. Using
novel Fourier and Hermite polynomial techniques, we determine the minimax optimal
rate, slightly larger than the parametric rate, under the squared error loss. For
the Hellinger loss, we provide a minimax lower bound using ideas adapted from the
squared error loss case.



1. Introduction 

This paper establishes the optimal minimax rate of convergence, under squared 
error loss, for densities that are normal mixtures. The analysis reveals a subtle 
difficulty in the application of Assouad's Lemma to parameter spaces defined by 
indirect regularity conditions, which complicate the usual construction of subsets of 
the parameter space indexed by 'hyper-rectangles'. 

More precisely, we consider independent observations from probability
distributions $P_f$ on the real line whose densities $f$ (with respect to Lebesgue
measure) belong to the set of convolutions

$$\mathcal{F} := \Big\{ f_\Pi(x) = \int \phi(x-u)\,d\Pi(u) : \Pi \in \mathcal{P}(\mathbb{R}) \Big\},$$

where $\phi$ denotes the standard normal $N(0,1)$ density and $\mathcal{P}(\mathbb{R})$
denotes the set of all probability measures on the (Borel sigma-field of the) real
line. The main result gives an asymptotic minimax lower bound for estimators
$\hat f_n$ based on $n$ independent observations from a density $f$ in $\mathcal{F}$.

Theorem 1.1. There exists a positive constant $c$ such that

$$\sup_{f\in\mathcal{F}} E_{n,f} \int_{-\infty}^{\infty} \big(\hat f_n(x) - f(x)\big)^2\,dx \;\ge\; c\cdot\frac{\log n}{n\sqrt{\log n}} =: c\,\ell_n$$

for every estimator sequence $\{\hat f_n\}$; that is, $\ell_n = \sqrt{\log n}/n$.



Zhang (1997) has established that the sine kernel estimator attains a maximum
expected loss of order $O(\ell_n)$ over a class of normal location mixtures under
the empirical Bayes setting. More precisely, under the assumption that $Y_i$ is a
random variable with a density $f(y\mid\theta_i)$ given $\theta_i$ for
$i = 1,\dots,n$, Theorem 2 in Zhang (1997) proves that there exists a constant
$B_p$, depending on $p$ only, for which



(e I {fifHx) - fifHx))'' dx^ \ 



7r(2s + l)n 

where fn\y) = / <P{y — u)dGn{u) for which s > is an integer and := 
(1/n) Yll=i Pi^i — ^) using an estimator 

1 " 

i=l 

where $K_a(x) = \sin(ax)/(\pi x)$ for $x \ne 0$ and $K_a(x) = a/\pi$ for $x = 0$.
Following exactly the same proof, we can show that

$$\sup_{f\in\mathcal{F}} E_{n,f} \int_{-\infty}^{\infty} \big(\hat f_n(x) - f(x)\big)^2\,dx = O(\ell_n)$$

with the same sine kernel estimator using an i.i.d. random sample $X_1,\dots,X_n$
from a density $f \in \mathcal{F}$.
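
As a rough illustration of the kernel $K_a$ defined above, the following sketch evaluates a kernel estimate of the form $(1/n)\sum_i K_a(x - X_i)$. The averaging form, the function names `kernel` and `kernel_estimate`, and the treatment of the bandwidth `a` as a free parameter are assumptions made for illustration only; the text does not fix them here.

```python
import math

def kernel(x, a):
    """K_a(x) = sin(a x) / (pi x) for x != 0, and a / pi at x = 0."""
    if x == 0.0:
        return a / math.pi
    return math.sin(a * x) / (math.pi * x)

def kernel_estimate(x, sample, a):
    """Hypothetical kernel density estimate (1/n) * sum_i K_a(x - X_i).

    The averaging form and the free bandwidth `a` are assumptions.
    """
    return sum(kernel(x - xi, a) for xi in sample) / len(sample)
```

Note that such an estimate can take negative values, because $K_a$ is not a nonnegative kernel.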

Thus, Theorem 1.1 combined with this upper bound determines the optimal minimax
rate for the problem of estimating a density in the class $\mathcal{F}$. The
minimax rate $\ell_n$ reveals the difficulty of estimating $f$ in $\mathcal{F}$,
in the sense that no estimator can achieve a faster rate than $\ell_n$ in the
worst case. That is, estimating a density in $\mathcal{F}$ under the $L_2$ loss is
slightly more difficult than a parametric problem such as estimating a normal
density with unknown mean. In fact, the same difficulty has been shown for larger
classes of functions. For instance, Ibragimov (2001) proves the optimal minimax
rate $\ell_n$ (in Theorem 4.1) for the class of analytic functions with a certain
growth condition

$$\Big\{ f : \sup_x |f(x+iy)| \le M\exp(c|y|^2) \Big\},$$

which includes the class $\mathcal{F}$ of normal location mixtures. Moreover,
Efromovich (2008, Theorem 5.2) proves that the sharp minimax risk is
$(1/(\pi n))(\log n/(2\gamma))^{1/2}(1 + o(1))$ for the infinitely differentiable class

|/ : - y y\{u)fdu < Q, h{u) = J /(x)e*"^dx| , 

which contains $\mathcal{F}$. This implies that the subclass $\mathcal{F}$ captures
the difficulty of the class of analytic functions (or infinitely differentiable
functions).

While the minimax result under the $L_2$ loss is the most successful case, this
loss is often criticized for giving too little weight to errors in the tails. As
an alternative, we also consider the Hellinger loss. For the following class of
normal location mixtures

$$\mathcal{F}_S := \Big\{ f : f(x) = \phi*\Pi(x) = \int \phi(x-u)\,d\Pi(u),\ \Pi \in \mathcal{P}_S(\mathbb{R}) \Big\},$$

where $\mathcal{P}_S(\mathbb{R})$ is a class of probability measures with sub-Gaussian tails,

$$\mathcal{P}_S(\mathbb{R}) := \big\{ \Pi : \Pi(|u| > t) \le C\exp(-ct^2) \text{ for all real } t \big\}$$



with constants $c$ and $C$, Ghosal and van der Vaart (2001) provide a sieved maximum
likelihood estimator whose convergence rate is 0((logn)^/n). However, as they 
pointed out, the optimal rate for $\mathcal{F}_S$ is still unknown (to the best of
my knowledge, there is no lower bound proof under the Hellinger loss) and here we
provide one possible lower bound.

Theorem 1.2. There exists a positive constant $c$ such that

$$\sup_{f\in\mathcal{F}_S} E_{n,f} \int \Big(\sqrt{\hat f_n(x)} - \sqrt{f(x)}\Big)^2 dx \;\ge\; c\cdot\frac{\log n}{n}$$

for every estimator sequence $\{\hat f_n\}$.
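
To get a feel for how close these rates are to the parametric rate, the short sketch below tabulates $1/n$, the $L_2$ rate $\sqrt{\log n}/n$ and the Hellinger lower bound $\log n/n$ for a few sample sizes. Constants are ignored and the function names are hypothetical; this is only a numerical illustration of the orders of magnitude.

```python
import math

def parametric_rate(n):
    """Parametric rate 1/n (constants ignored)."""
    return 1.0 / n

def l2_rate(n):
    """Rate sqrt(log n)/n appearing under the squared error loss."""
    return math.sqrt(math.log(n)) / n

def hellinger_lower_rate(n):
    """Rate log(n)/n appearing in the Hellinger lower bound."""
    return math.log(n) / n

if __name__ == "__main__":
    for n in (10**3, 10**6, 10**9):
        print(n, parametric_rate(n), l2_rate(n), hellinger_lower_rate(n))
```

For every $n \ge 3$ the three quantities are ordered as parametric, $L_2$, Hellinger, and they differ only by logarithmic factors.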

It is interesting to notice that even though $\mathcal{F}_S$ is a subclass of
$\mathcal{F}$, the optimal minimax rate under the Hellinger loss is larger than
the optimal rate $\ell_n$ under the $L_2$ loss. The proofs of the theorems, which
are given in Section 2, use a variation on Assouad's lemma (cf. van der Vaart,
1998, p. 347). When specialized to density estimation, the lemma can be cast into
the following form. (Henceforth we omit the $\pm\infty$ terminals on the integrals
when there is no ambiguity.) For completeness, we provide the proof in the
appendix.

Lemma 1.3. Suppose $\{f_\alpha, \alpha \in \{0,1\}^K\} \subseteq \mathcal{F}$, where
$K$ is a finite index set with cardinality $m$. For positive constants $c_0$ and
$c_1$ ($< 1$), and for a nonnegative loss function $W$ satisfying

(1)  $W(f,g_1) + W(f,g_2) \ge C\,W(g_1,g_2)$ for a constant $C > 0$,

suppose

(2)  $W(f_\alpha, f_\beta) \ge c_0\varepsilon^2\|\alpha-\beta\|_0$ for all $\alpha,\beta \in \{0,1\}^K$

and

(3)  $\int \frac{(f_\alpha - f_\beta)^2}{f_\alpha} \le \frac{c_1}{n}$ if $\|\alpha-\beta\|_0 = 1$,

where $\|\alpha-\beta\|_0 = \sum_{k\in K} 1\{\alpha_k \ne \beta_k\}$ is the Hamming
distance. Then, for every estimator $\hat f_n$ based on $n$ independent observations,

(4)  $\sup_{f\in\mathcal{F}} E_{n,f} W(\hat f_n, f) \ge \frac{C c_0}{4}(1 - \sqrt{c_1})\,m\varepsilon^2$.

Remark 1.1. Assumption (3) regarding the $\chi^2$ distance is merely a convenient
way to show that the testing affinity, $\|P_{f_\alpha}^n \wedge P_{f_\beta}^n\|_1$,
is greater than $1 - \sqrt{c_1}$, where $P_f^n$ is the product probability measure
under $f$ and $\|P \wedge Q\|_1$ is defined as $\int \min(dP, dQ)$.

Remark 1.2. We try to obtain the largest possible $m\varepsilon^2$ for a better
lower bound. While we construct the finite density class satisfying (2), we need
to restrict the sizes of $\varepsilon$ and $m$ so that two densities on the
nearest edge are reasonably close, as in (3), and so that the constructed
densities are truly in the parameter space $\mathcal{F}$.

For the proof in Section 2, we construct $f_\alpha$'s of the form

$$f_\alpha(x) = f_0(x) + \varepsilon\sum_{k\in K}\alpha_k\Delta_k(x), \qquad \alpha \in \{0,1\}^K,$$

where $f_0$ is a normal density function with zero mean (and variance specified
later) and $K = \{1, 3, \dots, 2m-1\}$, and $m$, $\varepsilon$, and $\Delta_k$ could
change with $n$. The main difficulty lies in choosing the (signed) perturbations
$\Delta_k$ so that each $f_\alpha$ is a normal location mixture. The natural way
around this problem is to construct the Assouad hyper-rectangle in the space of
mixing distributions,

$$f_\alpha = \phi*\Pi_\alpha \quad\text{where}\quad \Pi_\alpha(u) = \Pi_0(u) + \varepsilon\sum_{k\in K}\alpha_k V_k(u), \qquad \alpha \in \{0,1\}^K,$$

where the signed measures $V_k$ must be chosen so that each $\Pi_\alpha$ is a
probability measure. In contrast to the standard construction, the indirect form
$f_\alpha = \phi*\Pi_\alpha$ leads to an embedding condition like

(5)  $W(\phi*\Pi_\alpha, \phi*\Pi_\beta) \ge r_n\sum_{k\in K}(\alpha_k - \beta_k)^2$

for some $r_n$. The right side of (5) is expressed with
$\sum_{k\in K}(\alpha_k - \beta_k)^2$ instead of the Hamming distance in order to
emphasize the orthogonality relation. Traditionally such a property is obtained by
choosing the perturbations to be exactly orthogonal to each other, subject to
various other regularity properties that define the parameter space. The
smoothing effect of the convolution operation makes it more complicated to choose
the $V_k$ to achieve such near-orthogonality. Nevertheless, for the $L_2$ loss we
achieve (5) by choosing the perturbations so that their Fourier transforms are
orthogonal as elements of $L_2(\phi^2)$, the space of complex-valued functions $g$
such that $\int \phi(x)^2|g(x)|^2\,dx < \infty$. Similarly, we achieve (5) under the
Hellinger loss using similar ideas, except that $\phi^2$ is replaced by another
weight function.

2. Proofs of the Theorems
First, we introduce some notation used in this section. We let $\phi_{\sigma^2}$ be
the normal density with mean zero and variance $\sigma^2$. Following Rudin (1987,
chap. 9), we define the Fourier transform $\mathcal{T}$ by

$$\mathcal{T}f(t) := \hat f(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\exp(-ixt)f(x)\,dx$$

for $f \in L_1(\lambda)$, then extend from $L_1 \cap L_2$ to $L_2$ by the isometry.

For both theorems, we construct the signed measures $V_k$ to have (signed)
densities $v_k$ with respect to Lebesgue measure $\lambda$:

(6)  $\pi_\alpha(u) = \frac{d\Pi_\alpha}{d\lambda}(u) = \pi_0(u) + \varepsilon\sum_{k\in K}\alpha_k v_k(u), \qquad \alpha \in \{0,1\}^K,$

where $\pi_0$ is a normal density with zero mean and each $v_k$ is a function for
which $\int v_k = 0$ and

$$\pi_0(u) + \varepsilon\sum_{k\in K}\alpha_k v_k(u) \ge 0 \quad\text{for all } u.$$

We then need to check the assumptions of Lemma 1.3.

2.1. Ideas in the proof of Theorem 1.1. Here we let
$W(f,g) := \|f - g\|_2^2 = \int (f-g)^2$; then (1) is satisfied with $C = 1/2$. The
choice of the $v_k$'s is suggested by Fourier methods. By the Plancherel formula
(and the fact that $\hat\phi = \phi$),

$$\frac{1}{\varepsilon^2}\|f_\alpha - f_\beta\|_2^2 = \frac{1}{\varepsilon^2}\|\hat f_\alpha - \hat f_\beta\|_2^2 = 2\pi\int_{-\infty}^{\infty}\Big|\sum_{k\in K}(\alpha_k - \beta_k)\hat\phi(t)\hat v_k(t)\Big|^2 dt,$$

which lets us write the desired property (2) of Lemma 1.3 as

$$\int_{-\infty}^{\infty}\Big|\sum_{k\in K}(\alpha_k - \beta_k)\hat\phi(t)\hat v_k(t)\Big|^2 dt \ge \frac{c_0}{2\pi}\sum_{k\in K}(\alpha_k - \beta_k)^2, \qquad \forall\,\alpha,\beta \in \{0,1\}^K,$$

where we use $\|\alpha-\beta\|_0 = \sum_{k\in K}(\alpha_k - \beta_k)^2$. We might
achieve such an inequality by choosing the $v_k$'s to make the functions
$\psi_k(t) := \phi(t)\hat v_k(t)$ orthogonal. Ignoring other requirements for the
moment, we could even start from an orthonormal set $\{\psi_k\}$ and then try to
define $v_k$ as the (inverse) Fourier transform of $\psi_k(t)/\phi(t)$, provided
that the ratio is square integrable. This heuristic succeeds if we start from the
normalized orthogonal functions (see Jackson, 2004, chap. 9),

(7)  $\psi_k(t) := \frac{1}{\sqrt{k!}}H_k(2t)\sqrt{2\phi(2t)} = \frac{C}{\sqrt{k!}}\phi(t)^2H_k(2t)$

for $k \in K := \{1, 3, \dots, 2m-1\}$, where $C = \sqrt{2}(2\pi)^{3/4}$ is chosen so
that $C\phi(t)^2 = \sqrt{2\phi(2t)}$ and $H_k(t)$ is the Hermite polynomial of order
$k$, the polynomial for which $\phi(t)$ has $k$th derivative $(-1)^kH_k(t)\phi(t)$.
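
The orthonormality that this heuristic relies on can be checked numerically. The sketch below assumes that (7) reads $\psi_k(t) = H_k(2t)\sqrt{2\phi(2t)}/\sqrt{k!}$ with $H_k$ the probabilists' Hermite polynomial, and approximates $\int\psi_j\psi_k$ by a trapezoid rule. The function names, the truncation interval and the step size are illustrative choices, not part of the paper.

```python
import math

def hermite_prob(k, t):
    """Probabilists' Hermite polynomial via H_{j+1}(t) = t*H_j(t) - j*H_{j-1}(t)."""
    if k == 0:
        return 1.0
    h_prev, h = 1.0, t
    for j in range(1, k):
        h_prev, h = h, t * h - j * h_prev
    return h

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def psi(k, t):
    """Assumed form of (7): H_k(2t) * sqrt(2*phi(2t)) / sqrt(k!)."""
    return hermite_prob(k, 2.0 * t) * math.sqrt(2.0 * phi(2.0 * t)) / math.sqrt(math.factorial(k))

def inner_product(j, k, half_width=8.0, steps=16000):
    """Trapezoid approximation of the integral of psi_j * psi_k over the real line."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        t = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * psi(j, t) * psi(k, t)
    return total * h
```

Under these assumptions, `inner_product(j, k)` should be numerically close to 1 when `j == k` and close to 0 otherwise, including for pairs of odd indices such as `(1, 3)`.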

Remark 2.1. $\{H_k, k = 1, 2, \dots\}$ is sometimes called the "probabilists'
Hermite polynomials" (denoted as 'He' in Gradshteyn and Ryzhik (2007)), as opposed
to the "physicists' Hermite polynomials" $\mathbf{H}_k$. There is a one-to-one
relation between $H_k$ and $\mathbf{H}_k$,

$$H_k(t) = 2^{-k/2}\,\mathbf{H}_k\Big(\frac{t}{\sqrt{2}}\Big).$$
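
The relation between the two families can be checked directly from their standard three-term recurrences. The following sketch is illustrative only: the function names are hypothetical, and the recurrences used are the usual ones for the probabilists' and physicists' Hermite polynomials.

```python
import math

def hermite_prob(k, t):
    """Probabilists' Hermite polynomial: H_{j+1}(t) = t*H_j(t) - j*H_{j-1}(t)."""
    if k == 0:
        return 1.0
    h_prev, h = 1.0, t
    for j in range(1, k):
        h_prev, h = h, t * h - j * h_prev
    return h

def hermite_phys(k, x):
    """Physicists' Hermite polynomial: H_{j+1}(x) = 2x*H_j(x) - 2j*H_{j-1}(x)."""
    if k == 0:
        return 1.0
    h_prev, h = 1.0, 2.0 * x
    for j in range(1, k):
        h_prev, h = h, 2.0 * x * h - 2.0 * j * h_prev
    return h

def max_relation_gap(max_order=8, grid=(-2.0, -0.5, 0.0, 0.7, 1.9)):
    """Largest deviation from H_k(t) = 2^(-k/2) * Hphys_k(t / sqrt(2)) on a small grid."""
    gap = 0.0
    for k in range(max_order + 1):
        for t in grid:
            lhs = hermite_prob(k, t)
            rhs = 2.0 ** (-k / 2.0) * hermite_phys(k, t / math.sqrt(2.0))
            gap = max(gap, abs(lhs - rhs))
    return gap
```

For instance, the probabilists' polynomial of order 3 is $t^3 - 3t$, which equals 2 at $t = 2$.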



To calculate the inverse Fourier transform of $\psi_k(t)/\phi(t)$, we provide the
following lemma.

Lemma 2.1. For $b > a > 0$,

(8)  $\mathcal{T}^{-1}[\phi(at)H_k(bt)](u) = \rho_k\,\phi\Big(\frac{u}{a}\Big)H_k(b'u),$

where $\rho_k = (ic_{a,b})^k/a$ with $c_{a,b} = \sqrt{b^2/a^2 - 1}$ and
$b' = b/(a^2c_{a,b})$.

Remark 2.2. Lemma 2.1 illustrates a general form of the eigenvalue-eigenfunction
relation for the Fourier transform of Hermite functions,

$$\mathcal{T}[\phi(t)H_k(\sqrt{2}t)](u) = (-i)^k\phi(u)H_k(\sqrt{2}u).$$

(See (7.376) in Gradshteyn and Ryzhik (2007), or for more details, see §4.11 in
Kawata (1972).)
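
A numerical sanity check of the lemma can be sketched as follows. It assumes the unitary normalization of the Fourier transform (a factor $(2\pi)^{-1/2}$ in both directions) and the reading $\mathcal{T}^{-1}[\phi(at)H_k(bt)](u) = ((ic_{a,b})^k/a)\,\phi(u/a)H_k(b'u)$ of (8); both are assumptions about the garbled source, and the function names are hypothetical. The inverse transform is approximated by a trapezoid rule.

```python
import cmath
import math

def hermite_prob(k, t):
    """Probabilists' Hermite polynomial via H_{j+1}(t) = t*H_j(t) - j*H_{j-1}(t)."""
    if k == 0:
        return 1.0
    h_prev, h = 1.0, t
    for j in range(1, k):
        h_prev, h = h, t * h - j * h_prev
    return h

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def inverse_ft(g, u, half_width=12.0, steps=24000):
    """Trapezoid approximation of (2*pi)^(-1/2) * integral of exp(i*t*u) * g(t) dt."""
    h = 2.0 * half_width / steps
    total = 0j
    for i in range(steps + 1):
        t = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * cmath.exp(1j * t * u) * g(t)
    return total * h / math.sqrt(2.0 * math.pi)

def lemma_rhs(k, a, b, u):
    """Assumed right side of (8): ((i*c)^k / a) * phi(u/a) * H_k(b' * u)."""
    c = math.sqrt(b * b / (a * a) - 1.0)
    b_prime = b / (a * a * c)
    return (1j * c) ** k / a * phi(u / a) * hermite_prob(k, b_prime * u)

def lemma_gap(k, a, b, u):
    """Absolute difference between the numerical inverse transform and lemma_rhs."""
    lhs = inverse_ft(lambda t: phi(a * t) * hermite_prob(k, b * t), u)
    return abs(lhs - lemma_rhs(k, a, b, u))
```

With $a = 1$ and $b = \sqrt{2}$ the same check covers the inverse-transform version of the eigenfunction relation in Remark 2.2.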



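Proof of Theorem 1.1. By Lemma 2.1, this choice for the $\psi_k$'s in (7) leads to

(9)  $v_k(u) = C\sqrt{\frac{3^k}{k!}}\,\phi(u)H_k\Big(\frac{2u}{\sqrt{3}}\Big)$ for $k \in K$,

because $\mathcal{T}^{-1}[\phi(t)H_k(2t)](u) = i^k3^{k/2}\phi(u)H_k(2u/\sqrt{3})$ (the
unimodular factor $i^k$ is dropped in (9); it does not affect the orthonormality
of the $\psi_k$). By restricting to odd values of $k$ we make the $v_k$'s
real-valued and odd, thereby ensuring that $\int v_k\,d\lambda = 0$ and
$\int \pi_\alpha\,d\lambda = 1$ for each $\alpha$ in $\{0,1\}^K$.

In summary, the choice of $v_k$ as in (9) gives

(10)  $\|f_\alpha - f_\beta\|_2^2 = 2\pi\varepsilon^2\int\Big|\sum_{k\in K}(\alpha_k - \beta_k)\psi_k(t)\Big|^2dt = 2\pi\varepsilon^2\sum_{k\in K}(\alpha_k - \beta_k)^2.$

That is, the first condition (2) is satisfied with $c_0 = 2\pi$.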

We still need to check the second condition (3) and also show that $\varepsilon$
can be chosen small enough to make all the $\pi_\alpha$'s nonnegative. Actually,
we first show that $\pi_\alpha \ge \pi_0/2 \ge 0$ by choosing

(11)  $\varepsilon \le \frac{1}{16}\,3^{-m+1/2}\,m^{-3/2}$

and by choosing $\pi_0$ as a normal density with zero mean and variance $m$.
Secondly, we find the largest size $m$ of these hypercubes for which the two
densities $f_\alpha$ and $f_\beta$ remain close in terms of the $\chi^2$ distance,
of order $O(1/n)$, when $\alpha$ and $\beta$ differ in only one coordinate.

To control the denominator in (3), we first show that
$|v_k(u)| \le C_k\sqrt{m}\,\pi_0(u)$ where $C_k = 8\cdot3^{k/2}$. By Cramér's
inequality (Gradshteyn and Ryzhik, 2007, eqn. 8.954),

(12)  $|H_k(u)| \le \kappa\sqrt{k!}\exp(u^2/4)$ with $\kappa \approx 1.086$.
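
Inequality (12) is easy to probe numerically. The sketch below, with hypothetical function names and an arbitrary grid, computes the ratio $|H_k(u)|/(\sqrt{k!}\exp(u^2/4))$ for the probabilists' Hermite polynomials and reports its maximum over a range of orders; it is a spot check on a finite grid, not a proof.

```python
import math

def hermite_prob(k, t):
    """Probabilists' Hermite polynomial via H_{j+1}(t) = t*H_j(t) - j*H_{j-1}(t)."""
    if k == 0:
        return 1.0
    h_prev, h = 1.0, t
    for j in range(1, k):
        h_prev, h = h, t * h - j * h_prev
    return h

def cramer_ratio(k, u):
    """Ratio |H_k(u)| / (sqrt(k!) * exp(u^2 / 4)) bounded by kappa in (12)."""
    return abs(hermite_prob(k, u)) / (math.sqrt(math.factorial(k)) * math.exp(u * u / 4.0))

def max_cramer_ratio(max_order=12, half_width=10.0, steps=2000):
    """Maximum of cramer_ratio over orders 0..max_order and a uniform grid of u values."""
    worst = 0.0
    for k in range(max_order + 1):
        for i in range(steps + 1):
            u = -half_width + i * (2.0 * half_width / steps)
            worst = max(worst, cramer_ratio(k, u))
    return worst
```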

Applying this inequality to (9) gives, for $m \ge 3$,

(13)  $|v_k(u)| \le \kappa C\,3^{k/2}\frac{1}{\sqrt{2\pi}}\exp\Big(-\frac{1}{6}u^2\Big) \le C_k\,\phi(u/\sqrt{3})$

(14)  $\phantom{|v_k(u)|} \le C_k\,\phi(u/\sqrt{m}) = C_k\sqrt{m}\,\pi_0(u).$



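Using (14), we have

$$\varepsilon\sum_{k\in K}|v_k(u)| \le \varepsilon\cdot8\cdot3^{m-1/2}m^{3/2}\,\pi_0(u) \le \frac{1}{2}\pi_0(u)$$

by the choice of $\varepsilon$ in (11).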
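
The positivity argument can be spot-checked numerically for small $m$. The sketch below assumes the forms $v_k(u) = C\sqrt{3^k/k!}\,\phi(u)H_k(2u/\sqrt{3})$ with $C = \sqrt{2}(2\pi)^{3/4}$, $\pi_0$ the $N(0,m)$ density, and $\varepsilon = 3^{-m+1/2}m^{-3/2}/16$; these are reconstructions of (9) and (11) and should be read as assumptions. All function names are hypothetical.

```python
import math

def hermite_prob(k, t):
    """Probabilists' Hermite polynomial via H_{j+1}(t) = t*H_j(t) - j*H_{j-1}(t)."""
    if k == 0:
        return 1.0
    h_prev, h = 1.0, t
    for j in range(1, k):
        h_prev, h = h, t * h - j * h_prev
    return h

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

C = math.sqrt(2.0) * (2.0 * math.pi) ** 0.75  # constant with C * phi(t)^2 = sqrt(2 * phi(2t))

def v(k, u):
    """Assumed perturbation density from (9)."""
    return C * math.sqrt(3.0 ** k / math.factorial(k)) * phi(u) * hermite_prob(k, 2.0 * u / math.sqrt(3.0))

def pi0(u, m):
    """N(0, m) density."""
    return phi(u / math.sqrt(m)) / math.sqrt(m)

def eps_max(m):
    """Assumed largest epsilon allowed by (11)."""
    return 3.0 ** (-m + 0.5) * m ** (-1.5) / 16.0

def worst_perturbation_ratio(m, half_width=20.0, steps=4000):
    """Grid maximum of eps * sum_k |v_k(u)| / pi0(u) over odd k = 1, 3, ..., 2m - 1."""
    eps = eps_max(m)
    worst = 0.0
    for i in range(steps + 1):
        u = -half_width + i * (2.0 * half_width / steps)
        total = sum(abs(v(k, u)) for k in range(1, 2 * m, 2))
        worst = max(worst, eps * total / pi0(u, m))
    return worst
```

If the returned ratio stays below $1/2$, then $\pi_\alpha \ge \pi_0/2$ holds on the grid for every $\alpha$, which is the content of the displayed bound.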

Hence, under the condition (11), $\phi*\Pi_\alpha =: f_\alpha \ge f_0/2 := \phi*\Pi_0/2$,
which implies that the second condition (3) in Lemma 1.3 holds as soon as
$\int (f_\alpha - f_\beta)^2/f_0 \le c_1/(2n)$ for $\alpha$ and $\beta$ having only one
different coordinate. The denominator $f_0 = \phi*\Pi_0$ is again a normal density
with mean zero and variance $1 + m$, by the choice of $d\Pi_0/d\lambda := \pi_0$ as
the $N(0,m)$ density.

For convenience, we let $\alpha_1 \ne \beta_1$ (all the other cases work the same
way). By splitting the integral into the two regions $|x| \le M\sqrt{m}$ and
$|x| > M\sqrt{m}$ with a constant $M^2 = 8\log 9$,

$$\int\frac{(f_\alpha - f_\beta)^2}{f_0} = \int_{|x|\le M\sqrt{m}}\frac{(f_\alpha - f_\beta)^2}{f_0} + \varepsilon^2\int_{|x|>M\sqrt{m}}\frac{\big(\int\phi(x-u)v_1(u)\,d\lambda\big)^2}{f_0}.$$

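For the first integral, the denominator is bounded below on the interval
$\{|x| \le M\sqrt{m}\}$: $f_0(x)\,1\{|x| \le M\sqrt{m}\} \ge \exp(-M^2/2)/(2\sqrt{2\pi m}) =: 1/(C^*\sqrt{m})$,
where $C^* := 2\sqrt{2\pi}\exp(M^2/2)$. Then, using the $L_2$ loss calculation from (10),

$$\int_{|x|\le M\sqrt{m}}\frac{(f_\alpha - f_\beta)^2}{f_0(x)}\,dx \le C^*\sqrt{m}\,\|f_\alpha - f_\beta\|_2^2 = 2\pi C^*\sqrt{m}\,\varepsilon^2.$$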

For the second integral, recall that for any $k = 1, 3, \dots, 2m-1$,
$|v_k(u)| \le C_{2m-1}\phi(u/\sqrt{3}) = C_{2m-1}\sigma_0\phi_{\sigma_0^2}(u)$ with
$\sigma_0 = \sqrt{3}$ as in (13). Using $C_{2m-1} := 8\cdot3^{m-1/2}$ and
$\phi_{1+\sigma_0^2}(x) \le \sqrt{m}\,\phi_{1+m}(x)$, with the notation
$R := \{|x| > M\sqrt{m}\}$,

2 f {JcP{x-u)v,{u)dX)\ , f (/<A(x-n)0,g(n)dA)' 

e / ^ — '—dx < € C2„_iao / ^ dx 

Jr{x) mx) Jr(^^) (pi+m{x) 



Jr{x) 

= (^A^A (32™! <P,^„.{x)dx 



^ 64 2 
< — yme 



where the last inequality is obtained by the Gaussian tail property with
$\sqrt{m} \ge \sigma_0 := \sqrt{3}$,

$$\int_R\phi_{1+\sigma_0^2}(x)\,dx \le \exp\Big(-\frac{1}{8}M^2m\Big) = 3^{-2m},$$

by choosing $M^2 = 8\log 9$.

Combining these two upper bounds for the integral, we obtain

$$\int\frac{(f_\alpha - f_\beta)^2}{f_0} \le \sqrt{m}\,\varepsilon^2\,(2\pi C^* + 64/3) \le \frac{c_1}{2n}$$

as long as

(15)  $\sqrt{m}\,\varepsilon^2 \le \frac{1}{n}\cdot\frac{c_1}{2(2\pi C^* + 64/3)}.$

As a consequence, the constructed mixing densities fulfill the two requirements in
Assouad's lemma under conditions (11) and (15), that is, when

$$\varepsilon^2 \le \min\Big\{\frac{1}{16^2}\,3^{-2m+1}m^{-3},\ \frac{1}{n\sqrt{m}}\cdot\frac{c_1}{2(2\pi C^* + 64/3)}\Big\}.$$

Following the simplified Assouad's lemma (Lemma 1.3), the lower bound is obtained
as $c\varepsilon^2m$, which is at most $\min(3^{-2m}m^{-2}, \sqrt{m}/n)$ up to a
constant. To find the largest $m\varepsilon^2$, by equating
$3^{-2m}m^{-2} = \sqrt{m}/n$, we obtain $m$ and $\varepsilon^2$ of order $\log n$ and
$1/(n\sqrt{\log n})$ respectively, and hence the lower bound is obtained as
$\sqrt{\log n}/n$ up to a constant. □
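
The balancing step at the end of the proof can be illustrated numerically. The sketch below drops all constants and, for a given $n$, searches over integers $m$ for the largest value of $\min(3^{-2m}m^{-2}, \sqrt{m}/n)$, then compares it with $\sqrt{\log n}/n$. The function names and the search range are illustrative assumptions.

```python
import math

def assouad_bound(m, n):
    """min(3^(-2m) * m^(-2), sqrt(m) / n): allowed size of m * eps^2, constants dropped."""
    return min(3.0 ** (-2 * m) * m ** (-2), math.sqrt(m) / n)

def best_m(n, m_max=200):
    """Integer m in 1..m_max maximizing assouad_bound(m, n)."""
    return max(range(1, m_max + 1), key=lambda m: assouad_bound(m, n))

def ratio_to_rate(n):
    """Best bound divided by sqrt(log n) / n; stays bounded if the rates match."""
    return assouad_bound(best_m(n), n) / (math.sqrt(math.log(n)) / n)
```

In this constant-free version the maximizing $m$ grows like a multiple of $\log n$, and the ratio returned by `ratio_to_rate` stays within a fixed range as $n$ grows, consistent with a lower bound of order $\sqrt{\log n}/n$.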

2.2. Ideas in the proof of Theorem 1.2. Here we let
$W(f,g) := \|\sqrt{f} - \sqrt{g}\|_2^2 = \int(\sqrt{f} - \sqrt{g})^2$; then (1) is
satisfied with $C = 1/2$. First we relate the Hellinger distance and the $\chi^2$
distance. That is, suppose we can show
$(1/2)\pi_0(u) \le \pi_\alpha(u) \le (3/2)\pi_0(u)$, which in turn gives
$(1/2)f_0(x) \le f_\alpha(x) \le (3/2)f_0(x)$ by convolving with the standard normal
density. Then, using the upper bound for $f_\alpha$ and $f_\beta$,

(16)  $\|\sqrt{f_\alpha} - \sqrt{f_\beta}\|_2^2 = \int\frac{(f_\alpha - f_\beta)^2}{(\sqrt{f_\alpha} + \sqrt{f_\beta})^2} \ge \frac{1}{6}\int\frac{(f_\alpha - f_\beta)^2}{f_0}.$

Similarly, the lower bound for $f_\alpha$ would give an upper bound for the testing
condition

(17)  $\int\frac{(f_\alpha - f_\beta)^2}{f_\alpha} \le 2\int\frac{(f_\alpha - f_\beta)^2}{f_0}.$

Thus it would be enough to work with the following quantity

$$\int\frac{(f_\alpha - f_\beta)^2}{f_0} = \int\Big(\frac{f_\alpha - f_\beta}{\sqrt{f_0}}\Big)^2 = \varepsilon^2\int\Big(\sum_{k\in K}(\alpha_k - \beta_k)\frac{\phi*v_k}{\sqrt{f_0}}\Big)^2,$$

where the second equality is given by (6).

At first glance, $\int(f_\alpha - f_\beta)^2/f_0$ does not look amenable to Fourier
techniques. However, as Lemma 2.2 shows, $\phi*v_k/\sqrt{f_0}$ can be expressed as
a convolution of a normal density (with a variance larger than 1) with a certain
choice of the perturbation function $v_k$ and base function $\pi_0 = \phi_{\sigma^2}$.

Lemma 2.2. Consider the perturbation functions 

Vk{u) = ^(l){pu)Hk{ju), p^>^ + ^, 
where Ct is a constant depending on k and 7 > 0. Then 

where 

Ck 

(19) Vk{u) := -j=4>{pu)Hk{^u), 

Vkl 

with 

(20) a =1 + ^^^^, C. = C.^— , p= , 7 = ^. 

By Lemma 2.2, the denominator effect can be incorporated into the normal
convolution. Then we follow ideas similar to those used in the proof of Theorem 1.1.

Proof of Theorem 1.2. Again, the choice of the $v_k$'s is suggested by Fourier methods.
For convenience, we let ttq = 0, then /o = (f>2 and vTo = 2^^^tt^^^4>4- Assuming Vk 
in ([T2D are in C21 



T 



^0 



it) 



Tiy/Jolit) + eV!, ,akT(t)±{t)Tvk{t) by Lemma 



By the Plancherel formula, 

1 II /a II 2 

2 = e 



which lets us write the first condition (2) in Assouad's Lemma 1.3 as



/ 



— ^kaK 3 vr — ^kaAi 

Similar to the case of the squared error loss, we might achieve even an equality
with $c_0 = \pi/3$ by choosing the $\tilde v_k$'s to make the functions
$\tilde\psi_k(t) := \mathcal{T}[\phi_{4/3}](t)\,\mathcal{T}[\tilde v_k](t)$ orthonormal.
Ignoring other requirements, we also start from the same orthonormal set (7), and
then try to define $\tilde v_k$ as the inverse Fourier transform.
From the fact that

$$\mathcal{T}[\phi_{4/3}](t) = \frac{1}{\sqrt{2\pi}}\exp\Big(-\frac{2}{3}t^2\Big)$$

and by the definition of $\tilde v_k$ in (19), the requirement is equivalent to



(21) 

If we find all the parameters that make (21) true, we have the desired property
for the loss separation condition, i.e. we have

ifa - fp? 



(22) 



Thus $\tilde\rho$, $\tilde\gamma$ and $\tilde C_k$ must be found satisfying (21). After some calculations, (21) is equivalent to



T[<i>{pu)Hk{ium 



which leads to the following. 



,^/2(27r)3/4 

( 

Ck 



t\ 



Hk{2t) 



^ml-)Hk{2t) 



(u) 



Ck 



V2(27r)3/4 



(j){pu)Hk{^u). 
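
Recall Lemma 2.1, i.e. for $b > a > 0$,

$$\mathcal{T}^{-1}[\phi(at)H_k(bt)](u) = (ic_{a,b})^k\,\frac{1}{a}\,\phi\Big(\frac{u}{a}\Big)H_k\Big(\frac{b}{a^2c_{a,b}}u\Big).$$

Plugging $a = \sqrt{2/3}$ and $b = 2$ into the above expression, we have the
following solutions,

(23)  $\tilde C_k = (2\pi)^{3/4}\sqrt{3}\cdot5^{k/2}, \qquad \tilde\rho = \sqrt{3/2}, \qquad \tilde\gamma = 3/\sqrt{5}.$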



We need to ensure that the choice of cr^ = 1 satisfies the inequality > ^ + \ 
needed for the Lemma [2121 Comparing ([20]) and (f23]) . we obtain = 3 and 7 = 
which satisfy the condition. Also, is obtained as = {2^/'^^/^T)^/b^ . 

Therefore, this choice for the $\psi_k$'s leads to



(24) Vk 



(n) = 2^/^^^^<i){V2,u)Hk (^^) for k G K. 



By restricting to odd values of $k$, we make the $v_k$'s real-valued and odd, thereby
ensuring that $\int v_k\,d\lambda = 0$.

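Using exactly the same idea as in the previous section, if

(25)  $\varepsilon \le \frac{1}{2\kappa mC_{2m-1}}$ with $\kappa \approx 1.086$,

then

$$\frac{1}{2}\pi_0(u) \le \pi_\alpha(u) \le \frac{3}{2}\pi_0(u) \quad\text{for all } u \in \mathbb{R},\ \alpha \in \{0,1\}^K.$$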



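Now the second testing condition can be treated straightforwardly. Indeed, once
we choose orthonormal functions $\psi_k$, we obtain

$$\int\frac{(f_\alpha - f_\beta)^2}{f_\alpha} \le 2\int\frac{(f_\alpha - f_\beta)^2}{f_0} = 4\pi\varepsilon^2\|\alpha-\beta\|_0 = 4\pi\varepsilon^2 \quad\text{if } \|\alpha-\beta\|_0 = 1, \text{ by (17) and (22)}.$$

Thus it is enough to choose $\varepsilon^2 \le c_1/(4\pi n)$. With our choice
$\varepsilon = 1/(4\sqrt{n})$, the testing condition is satisfied.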

From the lower bound $m\varepsilon^2$, we want to choose $m$ as large as possible.
The condition in (25) restricts the size of $m$,

2KmC2m-i < emS™ < 

Thus, we have the upper bound for m, 

m < (1/(2 log 5)) log n ~ (0.31) log n. 
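
The arithmetic behind this bound is simple to reproduce. The sketch below, with hypothetical function names and all multiplicative constants ignored, computes the constant $1/(2\log 5)$ and the largest integer $m$ with $5^m \le \sqrt{n}$, which is the constraint that the bound $m \le \log n/(2\log 5)$ expresses.

```python
import math

def log_constant():
    """The constant 1 / (2 log 5) multiplying log n in the bound on m."""
    return 1.0 / (2.0 * math.log(5.0))

def max_hypercube_dimension(n):
    """Largest integer m with 5**m <= sqrt(n), computed exactly as 5**(2m) <= n."""
    m = 0
    while 5 ** (2 * (m + 1)) <= n:
        m += 1
    return m
```

For example, with $n = 10^6$ this gives $m = 4$, since $5^4 = 625 \le 1000 < 5^5$.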

Finally, we check that these constructed $\pi_\alpha$'s are inside the parameter
space $\mathcal{P}_S(\mathbb{R})$. From the fact that
$\pi_\alpha(u) \le (3/2)\pi_0(u)$ for all $u \in \mathbb{R}$ and $\alpha \in \{0,1\}^K$,
it is clear that $\pi_\alpha$ is in the space $\mathcal{P}_S(\mathbb{R})$ by the tail
property of the normal density.

Consequently, the lower bound is obtained as $\log n/n$ up to a constant.

□ 



3. Discussion 



It has been claimed that Fano's method is more general in a sense (see Yu (1997,
p. 428)). Indeed, using the Varshamov-Gilbert lemma (e.g. Lemma 2.9 in Tsybakov
(2009)), it is not very difficult to prove the same rate result for the class of
normal location mixtures with similar types of sub-parameter space.

However, Assouad's method seems more convenient in some cases. For instance,
before knowing how to construct the subspace, it would be extremely difficult to
figure out the right family of densities when there are only indirect regularity
conditions as in this example. Assouad's hyperrectangle method indicates that the
problem can be solved if we can show the orthogonal relations between constructed
densities. Specific constructions can cause another difficulty, but we at least
have some clues to handle these problems.

On the other hand, in case we know the metric entropy (good packing and covering
number bounds) results beforehand, the optimal minimax rates can be obtained
almost automatically with a predictive Bayes density estimator using the main
theorems in Yang and Barron (1999). It will be interesting to see if we can
calculate a sharper metric entropy for $\mathcal{F}$ or $\mathcal{F}_S$ than the one
that appeared in Ghosal and van der Vaart (2001).



4. Appendix 



Proof of Lemma 1.3. Most of the proof is based on ideas borrowed from Le Cam
(1973), Tsybakov (2009), and some unpublished notes by David Pollard. Denote
$A = \{0,1\}^K$ and for convenience write $E_\alpha$ for $E_{n,f_\alpha}$ and
$P_\alpha$ for $P_{f_\alpha}^n$. For any density estimator $\hat f_n$ based on the
observations $X_1,\dots,X_n$, define an estimator

$$\hat\alpha = \arg\min_{\alpha\in A} W(\hat f_n, f_\alpha).$$
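
By restricting the parameter space and by the definition of $\hat\alpha$,

$$\sup_{f\in\mathcal{F}} E_{n,f}W(\hat f_n, f) \ge \max_{\alpha\in A}E_\alpha W(\hat f_n, f_\alpha) \ge \frac{1}{2}\max_{\alpha\in A}E_\alpha\big(W(\hat f_n, f_\alpha) + W(\hat f_n, f_{\hat\alpha})\big) \ge \frac{C}{2}\max_{\alpha\in A}E_\alpha W(f_{\hat\alpha}, f_\alpha)$$

using the pseudo-distance property (1). Now, using the first condition (2) in the
Lemma followed by the simple fact that a maximum is bounded below by the average,
the last expression can be lower bounded by

$$\frac{Cc_0\varepsilon^2}{2}\max_{\alpha\in A}E_\alpha\sum_{k=1}^m 1\{\hat\alpha_k \ne \alpha_k\} \ge \frac{Cc_0\varepsilon^2}{2}\cdot\frac{1}{2^m}\sum_{\alpha\in A}\sum_{k=1}^m E_\alpha 1\{\hat\alpha_k \ne \alpha_k\}.$$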



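Define

$$P_{0,k} = \frac{1}{2^{m-1}}\sum_{\alpha\in A_{0,k}}P_\alpha, \qquad P_{1,k} = \frac{1}{2^{m-1}}\sum_{\alpha\in A_{1,k}}P_\alpha,$$

where $A_{i,k} = \{\alpha \in A : \alpha_k = i\}$ for $i = 0, 1$.
Since $\hat\alpha_k$ and $\alpha_k$ can take only the values 0 and 1,

$$\frac{1}{2^m}\sum_{\alpha\in A}\sum_{k=1}^m E_\alpha 1\{\hat\alpha_k \ne \alpha_k\} = \frac{1}{2^m}\sum_{k=1}^m\Big(\sum_{\alpha\in A_{0,k}}P_\alpha\{\hat\alpha_k \ne 0\} + \sum_{\alpha\in A_{1,k}}P_\alpha\{\hat\alpha_k \ne 1\}\Big) = \frac{1}{2}\sum_{k=1}^m\big(P_{0,k}\{\hat\alpha_k \ne 0\} + P_{1,k}\{\hat\alpha_k \ne 1\}\big),$$

which gives us the following lower bound

$$\sup_{f\in\mathcal{F}} E_{n,f}W(\hat f_n, f) \ge \frac{Cc_0\varepsilon^2}{4}\sum_{k=1}^m\|P_{0,k} \wedge P_{1,k}\|_1$$

by $Ph + Q(1-h) \ge \|P \wedge Q\|_1$ for $0 \le h \le 1$, with $h = 1\{\hat\alpha_k \ne 0\}$.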

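For $k = m$, each $\alpha$ in $A_{0,m}$ is of the form $(\gamma, 0)$ with
$\gamma \in D := \{0,1\}^{m-1}$. Similarly, each $\alpha$ in $A_{1,m}$ is of the form
$(\gamma, 1)$ with $\gamma \in D$. The affinity between $P_{0,m}$ and $P_{1,m}$ satisfies

$$\|P_{0,m} \wedge P_{1,m}\|_1 = \Big\|2^{-(m-1)}\sum_{\gamma\in D}P_{(\gamma,0)} \wedge 2^{-(m-1)}\sum_{\gamma\in D}P_{(\gamma,1)}\Big\|_1 \ge 2^{-(m-1)}\sum_{\gamma\in D}\|P_{(\gamma,0)} \wedge P_{(\gamma,1)}\|_1.$$

Note that $(\gamma, 0)$ and $(\gamma, 1)$ have only one different coordinate. By
similar calculations for the other $k$'s, we obtain

$$\sup_{f\in\mathcal{F}} E_{n,f}W(\hat f_n, f) \ge \frac{Cc_0\varepsilon^2}{2}\cdot\frac{m}{2}\min_{\|\alpha-\beta\|_0=1}\|P_\alpha \wedge P_\beta\|_1.$$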



In general, it is difficult to calculate the testing affinity exactly. Fortunately,
a convenient lower bound can be used in terms of distances between marginals when
$P_\alpha$ and $P_\beta$ are both product measures. For instance, when
$P_\alpha = P_{f_\alpha}^n$ in the i.i.d. case, we can bound this using the
chi-squared distance by the following relation:

$$\big(1 - \|P_{f_\alpha}^n \wedge P_{f_\beta}^n\|_1\big)^2 \le n\,\chi^2(P_{f_\alpha}, P_{f_\beta}) := n\int\frac{(f_\alpha - f_\beta)^2}{f_\alpha}.$$

Thus the second condition (3) in the Lemma yields the lower bound

$$\sup_{f\in\mathcal{F}} E_{n,f}W(\hat f_n, f) \ge \frac{Cc_0}{4}\,m\varepsilon^2\,(1 - \sqrt{c_1}).$$

See Tsybakov (2009, Lemma 2.7 on page 90) or Le Cam (1973, Lemma 1 on page 40) for
the derivation of facts about relations between distances. □

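Proof of Lemma 2.1. For $b > a > 0$, the generating function of the Hermite
polynomials gives

$$\phi(at)\exp\Big(btx - \frac{1}{2}x^2\Big) = \phi(at)\sum_{k=0}^{\infty}\frac{H_k(bt)}{k!}x^k.$$

Now, take the inverse Fourier transform of the left side of the above expression:

$$\mathcal{T}^{-1}\Big[\phi(at)\exp\Big(btx - \frac{1}{2}x^2\Big)\Big](u) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\exp(itu)\exp\Big(-\frac{a^2t^2}{2} + bxt - \frac{1}{2}x^2\Big)dt$$

$$= \frac{1}{a\sqrt{2\pi}}\exp\Big(\frac{(bx + iu)^2}{2a^2} - \frac{1}{2}x^2\Big) = \frac{1}{a}\phi\Big(\frac{u}{a}\Big)\exp\Big(\frac{bxui}{a^2} + \frac{x^2}{2}c_{a,b}^2\Big) = \frac{1}{a}\phi\Big(\frac{u}{a}\Big)\sum_{k=0}^{\infty}\frac{H_k\big(b/(a^2c_{a,b})\,u\big)}{k!}(ic_{a,b})^kx^k.$$

The inverse Fourier transform of the right side is

$$\sum_{k=0}^{\infty}\mathcal{T}^{-1}[\phi(at)H_k(bt)](u)\,\frac{x^k}{k!}.$$

By matching the coefficient of the $k$th power of $x$,

$$\mathcal{T}^{-1}[\phi(at)H_k(bt)](u) = (ic_{a,b})^k\,\frac{1}{a}\,\phi\Big(\frac{u}{a}\Big)H_k\Big(\frac{b}{a^2c_{a,b}}u\Big),$$

which proves the claim. □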

Proof of Lemma 2.2. The main ideas are just completing the square and a change of
variables. Note that $\phi*\phi_{\sigma^2} = \phi_{1+\sigma^2}$. By the definition of $v_k$, we have

[4) -k Vk{u)]{x) ^ Ck [(t)-k(t){pu)Hk{nu)]{x) 

Ck exp(-i(x - uf)^eM-y^u^)Hk{lu) 



klj (2^i + ^2))-i/4exp(-i^ 



2 . 



du 



Ck 



1 + a2)V4(2^)-3/4 J ^^^^(^^ u)Hk{ju)dz 



where E(^^p{x,u) is the exponential factor. By completing the square, 

E^^p{x, u) := exp ^(-^ + |^^^ )x^ + xu-{^ + ^^^'''"^) 

= exp f \rx'^ + XU- {- + -p^)u'^] by def. of in ([20]) 

y 2cj^ 2 2 y 

= exp ( -^(x - tt) I exp I --(1 + p -a- )^ I hy u:=d^u 



= (2vrcr)(/)5.2(x - u)(t) I u I , 

where the positive value for [1 + — a'^) is guaranteed by the condition > 
1/(7^ + 7^/2 > 1/(1 + 2(7^) := 1 - fj^. By the change of variables, 



~2 « \ ~2 



Using the definitions of the transformed variables in (20), the proof is complete. □